Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition
The problem of landmark recognition has achieved excellent results on small-scale datasets. When dealing with large-scale retrieval, however, issues that were irrelevant with small amounts of data quickly become fundamental for an efficient retrieval phase. In particular, computational time needs to be kept as low as possible, whilst the retrieval accuracy has to be preserved as much as possible. In this paper we propose a novel multi-index hashing method called Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It drastically reduces query time and outperforms state-of-the-art methods for large-scale landmark recognition in accuracy. We demonstrate that this family of algorithms can be applied to different embedding techniques, such as VLAD and R-MAC, obtaining excellent results in very short times on several public datasets: Holidays+Flickr1M, Oxford105k and Paris106k.
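The multi-index idea behind a scheme like BoI can be sketched as a voting procedure: several independent hash tables each cast votes for database items that share a bucket with the query, and only the top-voted shortlist is re-ranked exactly. The following is a minimal illustrative sketch, not the paper's actual algorithm; it assumes random-hyperplane LSH and cosine re-ranking, with toy descriptors standing in for VLAD/R-MAC vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(x, planes):
    """Sign-of-projection hash: one bit per random hyperplane."""
    bits = (x @ planes.T) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

# Toy database of L2-normalized descriptors (stand-ins for VLAD/R-MAC).
db = rng.normal(size=(1000, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Build several independent hash tables (the "multi-index" part).
n_tables, n_bits = 8, 8
tables = []
for _ in range(n_tables):
    planes = rng.normal(size=(n_bits, 64))
    buckets = {}
    for i, v in enumerate(db):
        buckets.setdefault(lsh_hash(v, planes), []).append(i)
    tables.append((planes, buckets))

def boi_query(q, topk=10):
    """Vote across tables, then exactly re-rank only the top candidates."""
    votes = np.zeros(len(db), dtype=int)
    for planes, buckets in tables:
        for i in buckets.get(lsh_hash(q, planes), []):
            votes[i] += 1
    cand = np.argsort(votes)[::-1][:topk * 4]   # shortlist by vote count
    sims = db[cand] @ q                         # exact cosine re-ranking
    return cand[np.argsort(sims)[::-1][:topk]]

# A query close to db[0] should retrieve db[0] first.
q = db[0] + 0.02 * rng.normal(size=64)
q /= np.linalg.norm(q)
print(boi_query(q)[0])
```

The speed gain comes from the fact that the exact similarity is computed only on the small voted shortlist rather than on the entire database.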
An Energy Saving Road Sweeper Using Deep Vision for Garbage Detection
Road sweepers are ubiquitous machines that help preserve our cities' cleanliness and health by collecting road garbage and sweeping dirt from our streets and sidewalks. They are often very mechanical instruments, needing to operate in harsh conditions and deal with all sorts of abandoned trash and natural garbage. They are usually composed of rotating brushes, collector belts and bins, and sometimes water or air streams. All of these mechanical tools are usually high in power demand and strongly subject to wear and tear. Moreover, due to the simple working logic often employed by these cleaning machines, these tools work in an "always on"/"max power" state, and any further regulation is left to the pilot. Therefore, adding artificial intelligence able to correctly operate these tools in a semi-automatic way would be greatly beneficial. In this paper, we propose an automatic road garbage detection system, able to locate most types of road waste with great precision and to correctly instruct a road sweeper to handle them. With this simple addition to an existing sweeper, we are able to save more than 80% of the electrical power currently absorbed by the cleaning systems and reduce brush wear by the same amount (prolonging their lifetime). This is done by choosing when to use the brushes and when not to, with how much strength, and where. The only hardware components needed by the system are a camera and a PC board able to read the camera output (and communicate via CAN bus). The software of the system is mainly composed of a deep neural network for semantic segmentation of images and a real-time software program to control the sweeper actuators with the appropriate timings. To prove the claimed results, we ran extensive tests on board such a truck, as well as benchmark tests for accuracy, sensitivity, specificity and inference speed of the system.
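The control loop described above can be sketched as: segment each camera frame, check which brush zones contain garbage pixels, and switch each brush on only where needed instead of keeping everything at full power. The sketch below is purely illustrative: the segmentation network is mocked by a threshold, and the zone layout, coverage threshold, and class id are assumptions, not the paper's actual CAN protocol or model:

```python
import numpy as np

GARBAGE = 1  # class id assumed for garbage pixels in the segmentation mask

def segment(frame):
    """Stand-in for the deep segmentation network: returns a class mask.
    Here we simply pretend anything brighter than a threshold is garbage."""
    return (frame > 0.8).astype(int) * GARBAGE

def brush_commands(mask, n_zones=3, min_coverage=0.01):
    """Split the mask into vertical zones (left/center/right brush areas)
    and switch each brush on only where enough garbage is visible."""
    zones = np.array_split(mask, n_zones, axis=1)
    return [float((z == GARBAGE).mean()) >= min_coverage for z in zones]

rng = np.random.default_rng(1)
frame = rng.random((120, 160)) * 0.5   # mostly clean road (values < 0.5)
frame[40:60, 10:40] = 1.0              # a patch of garbage on the left

mask = segment(frame)
print(brush_commands(mask))  # [True, False, False]
```

Gating the actuators on detection output in this way is what makes the claimed power and wear savings possible: brushes run only when and where garbage is actually seen.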
Automatic Generation of Semantic Parts for Face Image Synthesis
Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Most of the approaches in the literature, besides the quality of the generated images, put effort into increasing the generation diversity in terms of style, i.e., texture. However, they all neglect a different feature: the possibility of manipulating the layout provided by the mask. Currently, the only way to do so is manually, by means of graphical user interfaces. In this paper, we describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with a specific focus on human faces. Our proposed model embeds the mask class-wise into a latent space where each class embedding can be independently edited. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebAMask-HQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level. We also show our model can be put before a SIS generator, opening the way to fully automatic generation control of both shape and texture. Code available at https://github.com/TFonta/Semantic-VAE. Comment: Preprint, accepted for publication at ICIAP 202
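The class-wise editing pipeline described above can be sketched as a data flow: split the label mask into one channel per class, encode each channel into its own embedding, edit one class embedding independently, and decode all embeddings back into a new mask. The sketch below is an untrained toy with random linear weights that only illustrates the shapes and the per-class editing step; the bi-directional LSTM over class embeddings and all layer names are omitted or invented here, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, H, W, d = 4, 16, 16, 8   # toy sizes; real face masks are larger

def to_onehot(mask):
    """Split an integer label mask into one binary channel per class."""
    return np.stack([(mask == c).astype(float) for c in range(n_classes)])

# Illustrative (untrained) encoder/decoder weights.
W_enc = rng.normal(size=(H * W, d))
W_dec = rng.normal(size=(n_classes * d, n_classes * H * W))

def encode(onehot):
    """Class-wise encoding: each class channel gets its own d-dim embedding."""
    return onehot.reshape(n_classes, H * W) @ W_enc        # (n_classes, d)

def decode(z):
    """Decode all class embeddings jointly back to per-class logits,
    then take an argmax over classes to obtain a new label mask."""
    logits = (z.reshape(-1) @ W_dec).reshape(n_classes, H, W)
    return logits.argmax(axis=0)

mask = rng.integers(0, n_classes, size=(H, W))
z = encode(to_onehot(mask))

# Edit only one class embedding (say, "hair") and leave the others intact.
z_edit = z.copy()
z_edit[2] += 0.5 * rng.normal(size=d)

new_mask = decode(z_edit)
print(new_mask.shape)  # (16, 16)
```

The key property illustrated is locality: because each class has its own latent vector, perturbing one embedding changes that class's shape without touching the codes of the other classes.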
Landmark Recognition: From Small-Scale to Large-Scale Retrieval
In recent years, the problem of landmark recognition has been addressed in many different ways. Landmark recognition consists of finding the images most similar to a query image within a particular dataset of buildings or places. This chapter explains the most widely used techniques for solving the problem of landmark recognition, with a specific focus on techniques based on deep learning. Firstly, the focus is on the classical approaches for creating the descriptors used in the content-based image retrieval task. Secondly, the deep learning approach, which has shown overwhelming improvements in many computer vision tasks, is presented. Particular attention is paid to two major recent breakthroughs in Content-Based Image Retrieval (CBIR): the first is transfer learning, which improves the feature representation and therefore the accuracy of the retrieval system; the second is fine-tuning, which allows the performance of the retrieval system to be greatly improved. Finally, the chapter presents techniques for large-scale retrieval, in which datasets contain at least a million images.
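The core retrieval step that the chapter builds on can be summarized in a few lines: represent every image by a descriptor (classical or CNN-based), then rank the database by similarity to the query descriptor. A minimal sketch, assuming L2-normalized descriptors compared by cosine similarity (the feature extractor itself is mocked with random vectors):

```python
import numpy as np

def retrieve(query_feat, db_feats, topk=5):
    """Rank database images by cosine similarity to the query descriptor.
    Descriptors are assumed to come from any extractor (e.g. CNN features)."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(sims)[::-1][:topk]
    return order, sims[order]

rng = np.random.default_rng(0)
db_feats = rng.normal(size=(100, 128))                 # mocked descriptors
query = db_feats[7] + 0.01 * rng.normal(size=128)      # near-duplicate of image 7

ranks, sims = retrieve(query, db_feats)
print(ranks[0])  # 7
```

Transfer learning and fine-tuning change how `db_feats` is produced (better descriptors), while the large-scale techniques the chapter ends with replace the exhaustive `db @ q` scan with approximate indexing.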
Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation
Learning a disentangled representation of a distribution in a completely unsupervised way is a challenging task that has drawn attention recently. In particular, much focus has been put on separating factors of variation (i.e., attributes) within the latent code of a Generative Adversarial Network (GAN). Achieving this permits controlling the presence or absence of those factors in the generated samples by simply editing a small portion of the latent code. Nevertheless, existing methods that perform very well in a noise-to-image setting often fail when dealing with a real data distribution, i.e., when the discovered attributes need to be applied to real images. Some methods are able to extract and apply a style to a sample but struggle to maintain its content and identity, while others are not able to locally apply attributes and end up achieving only a global manipulation of the original image.
In this article, we propose a completely (i.e., truly) unsupervised method that is able to extract a disentangled set of attributes from a data distribution and apply them to new samples from the same distribution while preserving their content. This is achieved by using an image-to-image GAN that maps an image and a random set of continuous attributes to a new image that includes those attributes. These attributes are initially unknown and are discovered during training by maximizing the mutual information between the generated samples and the attribute vector. Finally, the obtained disentangled set of continuous attributes can be used to freely manipulate the input samples. We prove the effectiveness of our method on a series of datasets and show its application to various tasks, such as attribute editing, data augmentation, and style transfer.
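The mutual-information objective mentioned above is commonly made tractable InfoGAN-style: an auxiliary head tries to recover the continuous attribute vector from the generated output, and minimizing that reconstruction error maximizes a lower bound on the mutual information between attributes and samples. The toy below only illustrates that loss term with untrained linear stand-ins for the generator and the auxiliary head; all names and shapes are invented, not the article's model:

```python
import numpy as np

rng = np.random.default_rng(0)
img_dim, attr_dim = 64, 4

W_g = rng.normal(size=(attr_dim, img_dim))   # stand-in generator: attrs -> image change
W_q = np.linalg.pinv(W_g)                    # stand-in auxiliary head: image change -> attrs

def generate(x, c):
    """Image-to-image generator: input image x plus continuous attributes c."""
    return x + c @ W_g                       # attributes applied as a residual edit

def mi_loss(x, c):
    """Reconstruction surrogate: how well the auxiliary head recovers c from
    the generated edit. Minimizing this maximizes a lower bound on I(c; G(x, c))."""
    c_hat = (generate(x, c) - x) @ W_q
    return float(((c - c_hat) ** 2).mean())

x = rng.normal(size=img_dim)
c = rng.normal(size=attr_dim)
print(mi_loss(x, c))  # ~0: here the linear head recovers c exactly
```

In a real model this loss is added to the adversarial objective, so the generator is pushed to make each attribute dimension produce a recoverable, and hence disentangled, change in the output.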